44 research outputs found

    Multiple Instance Curriculum Learning for Weakly Supervised Object Detection

    When supervising an object detector with weakly labeled data, most existing approaches are prone to becoming trapped in discriminative object parts, e.g., finding the face of a cat instead of the full body, because supervision on the extent of the full object is lacking. To address this challenge, we incorporate object segmentation into detector training, which guides the model to correctly localize full objects. We propose the multiple instance curriculum learning (MICL) method, which injects curriculum learning (CL) into the multiple instance learning (MIL) framework. MICL starts by automatically picking the easy training examples, where the extent of the segmentation mask agrees with the detection bounding box. The training set is gradually expanded to include harder examples, training strong detectors that handle complex images. The proposed MICL method with segmentation in the loop outperforms state-of-the-art weakly supervised object detectors by a substantial margin on the PASCAL VOC datasets.
    Comment: Published in BMVC 201
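The curriculum-selection step described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: an example counts as "easy" when the box tightly enclosing its segmentation mask agrees (high IoU) with the detector's bounding box, and the IoU threshold is an assumed knob.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def curriculum_split(examples, threshold):
    """Partition (mask_box, detection_box) pairs into easy and hard sets.

    An example is easy when the mask-derived box and the detection box
    overlap at least `threshold` IoU; training would start on `easy` and
    the threshold would be relaxed over rounds to admit harder examples.
    """
    easy, hard = [], []
    for mask_box, det_box in examples:
        bucket = easy if box_iou(mask_box, det_box) >= threshold else hard
        bucket.append((mask_box, det_box))
    return easy, hard
```

In the paper's framing, lowering the threshold over successive training rounds gradually expands the training set from the easy subset toward the full, harder data.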

    Supervised descent method (SDM) applied to accurate pupil detection in off-the-shelf eye tracking systems

    The precise detection of the pupil/iris center is key to estimating gaze accurately. This becomes especially challenging in low-cost frameworks in which the algorithms employed in high-performance systems fail. In recent years an outstanding effort has been made to apply training-based methods to low-resolution images. In this paper, the Supervised Descent Method (SDM) is applied to the GI4E database. The 2D landmarks employed for training are the corners of the eyes and the pupil centers. To validate the proposed algorithm, a cross-validation procedure is performed. The strategy employed for training allows us to affirm that our method can potentially outperform the state-of-the-art algorithms applied to the same dataset in terms of 2D accuracy. The promising results encourage further study of training-based methods for eye tracking.
    Spanish Ministry of Economy, Industry and Competitiveness, contracts TIN2014-52897-R and TIN2017-84388-
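The generic SDM update behind this kind of landmark fitting can be sketched in a few lines: each cascade stage learns a linear regressor from features extracted at the current landmark estimate to the remaining displacement toward the ground truth. The toy `features` function below is an assumed stand-in for the local image descriptors real SDM implementations extract around each landmark.

```python
import numpy as np

def features(x):
    # Hypothetical affine feature map; real SDM uses image descriptors
    # (e.g., SIFT-like patches) sampled at the current landmark estimate.
    return np.concatenate([x, np.ones(1)])

def train_sdm(x0, x_true, n_stages=3):
    """Learn one least-squares descent map per cascade stage.

    x0, x_true: (N, d) arrays of initial and ground-truth landmark vectors.
    """
    stages, x = [], x0.copy()
    for _ in range(n_stages):
        F = np.stack([features(xi) for xi in x])   # (N, d+1) features
        D = x_true - x                             # remaining displacement
        R, *_ = np.linalg.lstsq(F, D, rcond=None)  # linear descent map
        x = x + F @ R                              # apply this stage's update
        stages.append(R)
    return stages

def apply_sdm(stages, x):
    """Run the learned cascade on a single landmark vector."""
    for R in stages:
        x = x + features(x) @ R
    return x
```

With these toy affine features a constant initialization offset is recovered exactly in one stage; with real image features, several stages are needed and each one only reduces the residual.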

    Deep Thermal Imaging: Proximate Material Type Recognition in the Wild through Deep Learning of Spatial Surface Temperature Patterns

    We introduce Deep Thermal Imaging, a new approach to close-range automatic recognition of materials that enhances the understanding people and ubiquitous technologies have of their proximal environment. Our approach uses a low-cost mobile thermal camera integrated into a smartphone to capture thermal textures. A deep neural network classifies these textures into material types. The approach works effectively without ambient light sources or direct contact with materials. Furthermore, the use of a deep learning network removes the need to handcraft the set of features for different materials. We evaluated the performance of the system by training it to recognise 32 material types in both indoor and outdoor environments. Our approach produced recognition accuracies above 98% on 14,860 images of 15 indoor materials and above 89% on 26,584 images of 17 outdoor materials. We conclude by discussing its potential for real-time use in HCI applications and future directions.
    Comment: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
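A minimal sketch of the classification pipeline, under assumptions not stated in the abstract: each thermal patch is rescaled by its own temperature range so the classifier sees relative spatial temperature patterns rather than absolute values, and the single softmax layer here is an illustrative stand-in for the paper's deep network.

```python
import numpy as np

def normalize_patch(patch):
    """Map a raw temperature patch to [0, 1] using its own min/max,
    making the representation independent of absolute temperature."""
    lo, hi = patch.min(), patch.max()
    return (patch - lo) / (hi - lo) if hi > lo else np.zeros_like(patch)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(patch, W, b, labels):
    """Score a flattened, normalized patch against each material class.

    W: (n_classes, n_pixels) weights, b: (n_classes,) biases -- a toy
    linear classifier standing in for a trained CNN.
    """
    probs = softmax(W @ normalize_patch(patch).ravel() + b)
    return labels[int(np.argmax(probs))], probs
```

The per-patch normalization is the point of the sketch: two patches of the same material captured at different ambient temperatures map to similar inputs.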

    Sharing information across object templates

    Object detection is a central and challenging task in computer vision. In this thesis, we first examine the "big data" hypothesis: object detection might be solved with simple models backed by massive training data. We empirically show that the performance of one of the state-of-the-art methods (discriminatively trained HoG templates) tends to saturate quickly when fed more data; the training data may need to grow exponentially to produce a fixed improvement in accuracy. We also find that the key difficulties in detection are large variation in object appearance and, more importantly, that the variation exhibits a "long tail" distribution: there are many rare cases with little training data, which makes those cases hard to model. This thesis addresses these challenges by proposing new representations that share information within and across object subcategories. Sharing allows one to learn models for rare subcategories in the long tail, where traditional approaches suffer from a lack of training data. We investigate two methods for sharing. We first examine global models that share entire training examples across multiple subcategories; for example, an SUV image might be used to train both a car and a truck subcategory model. We also examine local sharing that shares subwindows of training examples through "parts"; for example, nearly all vehicles contain wheel parts. By mixing and matching (or composing) different parts, one can implicitly encode an exponentially large set of subcategory models, which can even represent subcategories not encountered in the training data. We extensively experiment with and evaluate our models on different benchmarks, and show superior performance over the state of the art. Finally, we conclude with a detailed analysis of local part sharing for face analysis, perhaps the most well-studied of all object recognition problems. By using semantically defined parts (such as eyes, nose, and lips), a single model can simultaneously perform face detection, pose estimation, and landmark localization with state-of-the-art accuracy.

    Face Detection, Pose Estimation, and Landmark Localization in the Wild

    We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixture of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize, unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggest our system advances the state of the art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).
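Why tree-structured part models are "easy to optimize" can be shown in a short sketch: when parts form a tree, the best joint placement is found exactly by dynamic programming (max-sum message passing) from leaves to root, with no approximation. The scores and deformation cost below are toy stand-ins, not the paper's learned templates and springs.

```python
def best_placement(locations, appearance, pairwise, children, root):
    """Exact maximization of a tree-structured part score.

    appearance[part][loc] -> unary score for placing `part` at `loc`
    pairwise(parent_loc, child_loc) -> deformation score between parts
    children[part] -> list of child parts in the tree
    Returns the best achievable total score over all joint placements.
    """
    def message(part):
        # Best subtree score rooted at `part`, per candidate location.
        child_msgs = [message(c) for c in children[part]]
        scores = {}
        for loc in locations:
            s = appearance[part][loc]
            for m in child_msgs:
                # For each child, pick its best location given ours.
                s += max(m[cl] + pairwise(loc, cl) for cl in locations)
            scores[loc] = s
        return scores
    return max(message(root).values())
```

A dense (loopy) graph over the same parts would make this maximization NP-hard in general, which is the trade-off the abstract alludes to.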

    Capturing Long-Tail Distributions of Object Subcategories

    We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare. We describe distributed algorithms for learning large-mixture models that capture long-tail distributions, which are hard to model with current approaches. We introduce a generalized notion of mixtures (or subcategories) that allows examples to be shared across multiple subcategories. We optimize our models with a discriminative clustering algorithm that searches over mixtures in a distributed, “brute-force” fashion. We used our scalable system to train tens of thousands of deformable mixtures for VOC objects. We demonstrate significant performance improvements, particularly for object classes characterized by large appearance variation.
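The "generalized mixtures" idea above can be illustrated with a toy sharing rule: instead of a hard cluster assignment, every example is shared with each subcategory whose model scores it within a margin of the best one, so rare subcategories still accumulate training data. Scoring by distance to a centroid and the margin parameter are assumptions for illustration, standing in for the discriminative templates and search used in the paper.

```python
import numpy as np

def shared_assignments(examples, centroids, margin):
    """For each example, return all subcategory indices whose centroid
    lies within `margin` of the example's nearest centroid.

    examples: (N, d) array; centroids: (K, d) array.
    A margin of 0 recovers ordinary hard clustering; larger margins let
    borderline examples train several subcategory models at once.
    """
    assignments = []
    for x in examples:
        d = np.linalg.norm(centroids - x, axis=1)
        assignments.append(np.flatnonzero(d <= d.min() + margin).tolist())
    return assignments
```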